parameter symmetry
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- Asia > Middle East > Jordan (0.04)
- Asia > Middle East > Israel (0.04)
The Empirical Impact of Neural Parameter Symmetries, or Lack Thereof
Lim, Derek, Putterman, Moe, Walters, Robin, Maron, Haggai, Jegelka, Stefanie
Many algorithms and observed phenomena in deep learning appear to be affected by parameter symmetries -- transformations of neural network parameters that do not change the underlying neural network function. These include linear mode connectivity, model merging, Bayesian neural network inference, metanetworks, and several other characteristics of optimization or loss landscapes. However, theoretical analysis of the relationship between parameter space symmetries and these phenomena is difficult. In this work, we empirically investigate the impact of neural parameter symmetries by introducing new neural network architectures that have reduced parameter space symmetries. We develop two methods, with some provable guarantees, of modifying standard neural networks to reduce parameter space symmetries. With these new methods, we conduct a comprehensive experimental study consisting of multiple tasks aimed at assessing the effect of removing parameter symmetries. Our experiments reveal several interesting observations on the empirical impact of parameter symmetries; for instance, we observe linear mode connectivity between our networks without alignment of weight spaces, and we find that our networks allow for faster and more effective Bayesian neural network training.
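To make the notion of a parameter symmetry concrete, here is a minimal NumPy sketch (an illustration, not code from the paper) of the standard hidden-neuron permutation symmetry of a two-layer ReLU MLP: permuting the hidden units, together with the matching rows and columns of the weight matrices, leaves the network function unchanged.

```python
# Illustrative only: permutation symmetry of a two-layer ReLU MLP.
# Permuting hidden units (rows of W1/b1 and columns of W2) does not
# change the function the network computes.
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hid, d_out = 5, 8, 3

W1 = rng.normal(size=(d_hid, d_in))
b1 = rng.normal(size=d_hid)
W2 = rng.normal(size=(d_out, d_hid))

def mlp(x, W1, b1, W2):
    return W2 @ np.maximum(W1 @ x + b1, 0.0)  # ReLU hidden layer, linear output

perm = rng.permutation(d_hid)                  # a random hidden-unit permutation
W1_p, b1_p, W2_p = W1[perm], b1[perm], W2[:, perm]

x = rng.normal(size=d_in)
assert np.allclose(mlp(x, W1, b1, W2), mlp(x, W1_p, b1_p, W2_p))
print("Permuted parameters compute the identical function.")
```

ReLU networks additionally admit positive rescaling symmetries between consecutive layers (scale a hidden unit's incoming weights up and its outgoing weights down by the same factor); both families are examples of transformations that leave the network function fixed.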
A Tale of Two Symmetries: Exploring the Loss Landscape of Equivariant Models
Equivariant neural networks have proven to be effective for tasks with known underlying symmetries. However, optimizing equivariant networks can be tricky, and best training practices are less established than for standard networks. In particular, recent works have found small training benefits from relaxing equivariance constraints. This raises the question: do equivariance constraints introduce fundamental obstacles to optimization, or do they simply require different hyperparameter tuning? In this work, we investigate this question through a theoretical analysis of the loss landscape geometry. We focus on networks built using permutation representations, which we can view as a subset of unconstrained MLPs. Importantly, we show that the parameter symmetries of the unconstrained model have nontrivial effects on the loss landscape of the equivariant subspace and, under certain conditions, can provably prevent the equivariant network from reaching a global minimum. Further, we empirically demonstrate that in such cases, relaxing to an unconstrained MLP can sometimes resolve the issue. Interestingly, the weights eventually found via relaxation correspond to a different choice of group representation in the hidden layer. From this, we draw three key takeaways. (1) By viewing the unconstrained version of an architecture, we can uncover hidden parameter symmetries that were broken by the choice of constraint enforcement. (2) Hidden symmetries give important insights into loss landscapes and can induce critical points and even minima. (3) Minima induced by hidden symmetries can sometimes be escaped by constraint relaxation, and we observe that the network then jumps to a different choice of constraint enforcement. Effective equivariance relaxation may require rethinking the fixed choice of group representation in the hidden layers.
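As a hedged illustration of this setting (the specific parameterization below is an assumption chosen for exposition, not the authors' construction), the sketch shows how a permutation-equivariant linear layer occupies a small constrained subspace of an unconstrained layer's weight space: weights of the form a*I + b*(1 1^T)/n commute with every permutation acting on inputs and outputs, while a generic unconstrained weight matrix does not.

```python
# Toy example (assumed parameterization, for illustration): a linear layer
# constrained to W = a*I + b*(1 1^T)/n is equivariant to all permutations
# acting via the permutation representation on inputs and outputs, i.e. it is
# a low-dimensional equivariant subspace of an unconstrained linear layer.
import numpy as np

rng = np.random.default_rng(1)
n = 6
a, b = 1.3, -0.7
W_equiv = a * np.eye(n) + b * np.ones((n, n)) / n  # constrained, equivariant weights
W_free = rng.normal(size=(n, n))                   # generic unconstrained weights

P = np.eye(n)[rng.permutation(n)]                  # random permutation matrix
x = rng.normal(size=n)

print(np.allclose(W_equiv @ (P @ x), P @ (W_equiv @ x)))  # True: equivariant
print(np.allclose(W_free @ (P @ x), P @ (W_free @ x)))    # almost surely False
```

Because such a constrained layer sits inside the larger unconstrained weight space, parameter symmetries of the unconstrained MLP can act nontrivially on the equivariant subspace, which is the interaction the abstract analyzes.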
Parameter Symmetry Breaking and Restoration Determines the Hierarchical Learning in AI Systems
Ziyin, Liu, Xu, Yizhou, Poggio, Tomaso, Chuang, Isaac
More and more phenomena that are virtually universal in the learning process have been discovered in contemporary AI systems. These phenomena are shared by models with different architectures, trained on different datasets, and with different training techniques. The existence of these universal phenomena calls for one or a few universal explanations. However, to date, most of the phenomena are instead described by narrow theories tailored to explain each phenomenon separately, often focusing on specific models trained on specific tasks or loss functions and in isolation from other interesting phenomena that are indispensable parts of the deep learning phenomenology. Certainly, it is desirable to have a universal perspective, if not a universal theory, that explains as many phenomena as possible. In the spirit of science, a universal perspective should be independent of system details such as variations in minor architecture definitions, choice of loss functions, training techniques, etc. A universal theory would give the field a simplified paradigm for thinking about and understanding AI systems and a potential design principle for a new generation of more efficient and capable models.
- Europe > Switzerland > Zürich > Zürich (0.14)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > United States > Massachusetts (0.04)
- Europe > Switzerland > Vaud > Lausanne (0.04)
- Research Report > New Finding (0.48)
- Research Report > Experimental Study (0.34)
Optimizing parametrized quantum circuits via noise-induced breaking of symmetries
Fontana, Enrico, Cerezo, M., Arrasmith, Andrew, Rungger, Ivan, Coles, Patrick J.
Very little is known about the cost landscape for parametrized Quantum Circuits (PQCs). Nevertheless, PQCs are employed in Quantum Neural Networks and Variational Quantum Algorithms, which may allow for near-term quantum advantage. Such applications require good optimizers to train PQCs. Recent works have focused on quantum-aware optimizers specifically tailored for PQCs. However, ignorance of the cost landscape could hinder progress towards such optimizers. In this work, we analytically prove two results for PQCs: (1) We find an exponentially large symmetry in PQCs, yielding an exponentially large degeneracy of the minima in the cost landscape. (2) We show that noise (specifically non-unital noise) can break these symmetries and lift the degeneracy of minima, making many of them local minima instead of global minima. Based on these results, we introduce an optimization method called Symmetry-based Minima Hopping (SYMH), which exploits the underlying symmetries in PQCs to hop between local minima in the cost landscape. The versatility of SYMH allows it to be combined with local optimizers (e.g., gradient descent) with minimal overhead. Our numerical simulations show that SYMH improves the overall optimizer performance.
- Europe > United Kingdom (0.28)
- North America > United States > New Mexico > Los Alamos County > Los Alamos (0.04)
- Government > Regional Government (0.67)
- Energy (0.46)
- Information Technology > Hardware (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.88)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.88)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)
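As a toy illustration of the symmetry-induced degeneracy and minima-hopping idea sketched in the SYMH abstract above (a simplified stand-in under assumed forms, not the SYMH algorithm or an actual PQC cost function), the snippet below builds a two-parameter cost with an exact shift symmetry, converges to one minimum by gradient descent, and then applies the symmetry transformation to hop to a distinct, equal-cost minimum.

```python
# Toy sketch (not SYMH itself): a cost with an exact parameter symmetry has
# degenerate minima; applying the symmetry map to a converged point "hops" to
# another minimum of equal cost, from which local optimization can continue.
import numpy as np

def cost(theta):
    t1, t2 = theta
    # Invariant under the shift (t1, t2) -> (t1 + pi, t2 + pi).
    return np.cos(t1) * np.cos(t2) + 0.1 * np.sin(t1 + t2)

def grad(theta, eps=1e-6):
    g = np.zeros_like(theta)
    for i in range(len(theta)):
        e = np.zeros_like(theta)
        e[i] = eps
        g[i] = (cost(theta + e) - cost(theta - e)) / (2 * eps)  # central differences
    return g

def descend(theta, lr=0.1, steps=500):
    for _ in range(steps):
        theta = theta - lr * grad(theta)
    return theta

theta_star = descend(np.array([0.3, 2.5]))      # converge to one local minimum
hopped = descend(theta_star + np.pi)            # symmetry hop, then re-optimize
print(cost(theta_star), cost(hopped))           # distinct parameters, equal cost
```

In the noisy setting the abstract describes, non-unital noise breaks this exact degeneracy so that some symmetric copies become merely local minima; SYMH then uses the symmetry transformations to hop between them during optimization.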